Searching and Querying Wide-Area Distributed Collections

Authors

  • M. Franklin
  • G. Mihaila
  • L. Raschid
  • T. Urhan
  • M. E. Vidal
  • V. Zadorozhny
Abstract

The rapid proliferation of widely distributed data and document collections raises the need for wrapper/mediator architectures that can handle the challenges of wide-area query processing. Traditional query and search techniques do not scale to large numbers of repositories and cannot cope with the unpredictable performance and (un)availability of access to such repositories. Research at the University of Maryland is aimed at addressing the following challenges:

  • Query planning for wide area networks: We describe Web query optimization techniques that use a Web-Wrapper cost model (WCM) and WebPT, a tool to predict response times from WWW sources.
  • Coping with unexpected delays: Query Scrambling is a reactive query execution scheme that adapts the query plan in response to runtime delays. XJoin is a small-footprint, fully pipelinable join operator that automatically adjusts the flow of tuples during query execution.
  • Planning with alternate sources: We investigate strategies for choosing among multiple alternative data sources, and techniques to adjust these decisions when severe delays are encountered.
  • Source publishing and selection: We describe how sources can be published on the WWW using XML, and we investigate source selection using content and quality metadata.

Acknowledgements: We are very grateful to Laura Bright and Tao Zhan for their research and programming support.
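The abstract describes XJoin as a fully pipelinable join operator but gives no details of its design. As a rough illustration of what "pipelinable" means for a join over remote sources, the following is a minimal sketch of a plain symmetric hash join: each arriving tuple is inserted into its own side's hash table and immediately probed against the other side, so results are emitted as soon as both matching tuples have arrived. This is not XJoin itself; it omits memory management and any handling of delayed sources, and the function name, parameters, and sample data are hypothetical.

```python
from collections import defaultdict


def symmetric_hash_join(arrivals, key_left, key_right):
    """Yield (left_row, right_row) pairs as soon as both rows have arrived.

    arrivals: iterable of (side, row) pairs, where side is "L" or "R",
    in whatever interleaved order the two sources deliver their rows.
    key_left / key_right: functions extracting the join key from a row.
    """
    # One in-memory hash table per input (a simplifying assumption:
    # everything is assumed to fit in memory).
    tables = {"L": defaultdict(list), "R": defaultdict(list)}

    for side, row in arrivals:
        key = key_left(row) if side == "L" else key_right(row)

        # Insert the new row into its own side's table...
        tables[side][key].append(row)

        # ...then probe the opposite table and emit matches immediately.
        other = "R" if side == "L" else "L"
        for match in tables[other].get(key, []):
            yield (row, match) if side == "L" else (match, row)


# Hypothetical interleaved arrival order from two remote sources.
arrivals = [
    ("L", (1, "a")),
    ("R", (1, "x")),
    ("R", (2, "y")),
    ("L", (2, "b")),
    ("L", (1, "c")),
]
result = list(symmetric_hash_join(arrivals, lambda r: r[0], lambda r: r[0]))
```

Because neither input is ever blocked waiting for the other to finish, a slow source delays only the matches that depend on its own tuples.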

Similar resources

Metadata Infrastructure for Sound Recordings

This paper describes the first iteration of a working model for searching heterogeneous distributed metadata repositories for sound recording collections, focusing on techniques used for real-time querying and harmonizing diverse metadata models. The initial model for a metadata infrastructure presented here is the first of its kind for sound recordings.

Report on the TREC-8 Experiment: Searching on the Web and in Distributed Collections

The Internet paradigm permits information searches to be made across wide-area networks where information is contained in web pages and/or whole document collections such as digital libraries. These new distributed information environments reveal new and challenging problems for the IR community. Consequently, in this TREC experiment we investigated two questions related to information searches...

Distributed Data Streams

Definition: A majority of today’s data is constantly evolving and fundamentally distributed in nature. Data for almost any large-scale data-management task is continuously collected over a wide area, and at a much greater rate than ever before. Compared to traditional, centralized stream processing, querying such large-scale, evolving data collections poses new challenges, due mainly to the phys...

Toward sustainable publishing and querying of distributed Linked Data archives

Purpose: This paper details a low-cost, low-maintenance publishing strategy aimed at unlocking the value of Linked Data collections held by libraries, archives and museums. Design/methodology/approach: The shortcomings of commonly used Linked Data publishing approaches are identified, and the current lack of substantial collections of Linked Data exposed by libraries, archives and museums is cons...

Optimised Phrase Querying and Browsing of Large Text Databases

Most search systems for querying large document collections—for example, web search engines—are based on well-understood information retrieval principles. These systems are both efficient and effective in finding answers to many user information needs, expressed through informal ranked or structured Boolean queries. Phrase querying and browsing are additional techniques that can augment or repl...

Publication date: 2007